Harvesting Dutch Trees: Syntactic Properties of Spoken Dutch

Authors

  • Ton van der Wouden
  • Ineke Schuurman
  • Machteld Schouppe
  • Heleen Hoekstra
Abstract

In this paper, we report on quantitative research into certain word order phenomena in Dutch. In our research, we use the Spoken Dutch Corpus (CGN), a major new resource for research into contemporary spoken Dutch. After briefly introducing the primary data, the annotations added, and some of the tools to explore the primary data and the annotations, we illustrate how the Corpus may be utilized to answer certain linguistic questions concerning the Dutch language.

Similar resources

Syntactic Analysis in the Spoken Dutch Corpus (CGN)

The paper describes the syntactic annotation of the Spoken Dutch Corpus (“Corpus Gesproken Nederlands” or CGN), the Dutch-Flemish project (1998-2003) aiming at the collection, description and annotation of ten million words of spoken Dutch. In the first part, the background of the parsing strategy is discussed, as well as some details concerning the actual implementation of the parsing process....

Study on two species of Ophiostoma in relation with Dutch elm disease in Iran

An investigation was carried out in some areas of Golestan Province including: Loveh forest, Soosara, Daland forest park, Tooskestan; Gilan Province including Siahkal and Asalem forests; Arasbaran and landscape of urban trees during 1999–2007. In this investigation, based on some morphological, physiological and molecular characteristics and also comparison with standard isolates two species Op...

Belgian Standard Dutch

Dutch is a language spoken by about 20 million people in the Netherlands and Belgium. This region is not only characterised by a complex dialect situation, but also by the use of two institutionalised varieties of the Standard language: Netherlandic Dutch is spoken in the Netherlands and is documented in Collins & Mees (1982), Mees & Collins (1983) and Gussenhoven (1999), while Belgian Dutch is...

Spontaneous Speech in the Spoken Dutch Corpus

In this paper the Spoken Dutch Corpus project is presented, a joint Flemish-Dutch undertaking aimed at the compilation and annotation of a corpus of 1,000 hours of spoken Dutch. Upon completion, the corpus will constitute a valuable resource for research in the fields of (computational) linguistics and language and speech technology. Although the corpus will contain a fair amount of read speech...

Syntactic Annotation for the Spoken Dutch Corpus Project (CGN)

Of the ten million words of contemporary standard Dutch in the Spoken Dutch Corpus (Corpus Gesproken Nederlands, CGN), a selection of one million words of natural spoken language will be annotated syntactically. In the present paper we discuss the tag sets and the annotation procedures that are currently being developed and tested. The annotation tags provide information about syntactic constit...

Publication date: 2002